Getting the data
Delivery data
The actual historical delivery data of each Crop Protection (CP) brand and SKU of each country was extracted with the following SQL query from the Redshift database. This example gives data of India.
select
distinct del.<delivery_id>,
mp.plant_id,
mm.material_id as SKU,
mm.<brand name> as brand,
del.delivery_date,
mpld.product_line,
del.<quantity delivered>
from
<delivery table> del
join <material table> mm on del.material_id = mm.material_id
join <plant table> mp on del.plant_id = mp.plant_id
join <product line table> mpld on mm.uc_productlineowner = mpld.product_line
join <billing table> mbf on del.delivery_number = mbf.delivery_number
join <material type table> mma_geb on mma_geb.material_guid = mm.material_guid
where
del.source_system_cde = <SAP>
and mp.country = 'IN' -- India - Change here to include/remove any country
and not(<plant> IS <obsolete>)
and del.<delivery type> = <GSAP>
and mm.record_type = 'CP' -- Crop Protection products only
AND del.delivery_date > '2016/01/01' -- historical data from 2016 is extracted
AND mpld.<product line> IN <relevant product lines>
and mbf.<billing document number> is not null
and del.<quantity delivered> > 0
group by
1,
2,
3,
4,
5,
6,
7
order by
5
This query would be executed on the 1st day of every month to get the latest delivery data for each country.
Weather data
Weather is an important factor affecting agriculture. Weather data was updated on the first of every month.
Since fetching/updating the weather data was done though company specific tools, it cannot be documented here.
Economic Indicators data
World Bank indicators of the country can also prove to be useful for forecasting. This was fetched with the wb api:
all_indicator_ids = [s['id'] for s in wb.series.list()]
df = wb.data.DataFrame(all_indicator_ids, country_code, time=time_range, labels=True)
df['Series'] = df['Series'].apply(lambda x: re.sub('[^A-Za-z0-9_]+', ' ', x))
df.dropna(how='any', inplace = True)
df = df.set_index('Series').T.reset_index(names = 'year')
df['year'] = df['year'].str.replace('YR', '')
df['year'] = df['year'].astype('int32')
df['year'] = df['year'] + 1
df.to_parquet(f'data/{country}/economic_indicators.parquet', index=False)
This would also be executed on the 1st day of every month to get the latest world bank data for each country.
IBP
IBP forecasts were extracted with the following SQL query from the Redshift database. The following example gives IBP forecasts of India for the year 2023 as on January 1st 2023.
select
mm.material_id as SKU,
mm.<brand name> as brand,
<forecasted period> as date,
sum(<forecasted quantity>),
from
<IBP sales forecast table> sf
join <material table> mm on sf.material_id = mm.material_id -- to get brand name
where
<validity date> <= '2023-01-01'
and <forecasted period> between '2023-01-01' and '2023-12-31' -- Set desired forecast period
and <forecasting customer country> in ('IN') -- Set desired countries
group by
1,
2,
3
The dates would be changed and the query would be executed every month to get the latest forecasts on the 1st of every month. The process was automated in the pipeline.
The material ids of the IBP dataset would be matched with the SKUs of delivery data to filter only CP products.